[2178] Add data reader for offgrid inference#2426
Conversation
grassesi
left a comment
There was a problem hiding this comment.
In general I like the Decorator pattern you are employing here, since it keeps the new behavior mostly contained in one place with very few changes to the initialization logic l. I would like to this fact sharpened/highlighted more. I have just some questions (maybe I am missing context here) and a few nitpicks
There was a problem hiding this comment.
You are (silently) changing the behavior for all people using the config_forecasting.yml. If this was by mistake please revert. If this is intended, please make it a separate PR and communicate in the developer channel ahead of time.
There was a problem hiding this comment.
Good point! The config matches the current version in the develop branch, so it should merge without changes. Shall I revert the file to the previous version?
| else: | ||
| # empty source channels (needed from base class) | ||
| self.source_idx: NDArray[np.int64] = np.array([], dtype=np.int64) | ||
| self.source_channels: list[str] = [] | ||
|
|
||
| # empty target channels (needed from base class) | ||
| self.target_idx: NDArray[np.int64] = np.array([], dtype=np.int64) | ||
| self.target_channels: list[str] = [] | ||
|
|
||
| # empty target channel weights | ||
| self.target_channel_weights: list[float] = [] | ||
|
|
||
| # neutral normalization | ||
| self.mean = np.zeros(0, dtype=np.float32) | ||
| self.stdev = np.ones(0, dtype=np.float32) |
There was a problem hiding this comment.
I dont see the point of this branch. Looking at how this class is instantiated, it always wraps another DataReader. Do you plan on modifying the instantiation logic so this might happen? I would just make the ref_reader obligatory. This would also focus this design.
There was a problem hiding this comment.
Thanks for pointing this out; I made the ref_reader mandatory and removed the corresponding logic.
|
|
||
| return np.arange(perms_len) | ||
| # Setup for offgrid inference and evaluation | ||
| offgrid_eval = mode_cfg.get("offgrid_eval", {}) | ||
| # Path to .npy file containing offgrid coordinates | ||
| self.offgrid_template = offgrid_eval.get("grid", None) | ||
| # Frequency defined by config (offgrid_eval.frequency) with fallback to time_window_step | ||
| self.offgrid_frequency = offgrid_eval.get("frequency", mode_cfg.time_window_step) | ||
|
|
There was a problem hiding this comment.
Are you planning on using self.offgrid_template and self.offgrid_frequency somewhere else in this class or by any downstream code? If not please remove since this pollutes the namespace of this class.
There was a problem hiding this comment.
These attributes are only used here in the MultiStreamDataSampler, so I changed them to plain variables.
| ) | ||
| # load offgrid dataset if specified | ||
| if self.offgrid_template is not None: | ||
| filename = pathlib.Path(str(self.offgrid_template)) |
There was a problem hiding this comment.
nitpick: I dont think the str typecast is needed here.
| offgrid_eval: | ||
| # absolute path to .npy file with shape (N, 2) [lat, lon] used for offgrid inference | ||
| grid: /e/scratch/weatherai/shared_work/offgrid-test/offgrid_regular.npy | ||
| # temporal spacing between offgrid samples, e.g. 6h | ||
| frequency: 6h | ||
| # TODO add support for geoinfos | ||
| geoinfos: null |
There was a problem hiding this comment.
If I understand correctly this intends to trick the decoder into generating data on a different grid? Since it is output related, does it maybe make sense to put it under <mode>_config.output? At least I would rename offgid_eval to something like offgrid_inference to be consistent with the existing terminology used in the code.
There was a problem hiding this comment.
@clessig what is your opinion on the design choice here?
Description
Summary
Adds support for offgrid inference, enabling the model to produce forecasts at arbitrary lat/lon target locations rather than being constrained to the original data grid. Introduces a dedicated offgrid data reader, wires it into the multi-stream data sampler, and provides configs for running and evaluating offgrid forecasting.
New config keys
Adds a
test_config.offgrid_inferenceblock to the config:How to run
Implementation notes
MultiStreamDataSampler, aDataReaderOffgridis instantiated per file and stream and inherits metadata (channels and normalization) from the original data reader. This lets channels and normalization parameters be loaded from the dataset regardless of type (anemoi, obs, fesom), but the two steps can be merged if preferred.Testing
Tested with a uniform grid and with random coordinates for SYNOP. Both grid files are temporarily stored at:
/e/scratch/weatherai/shared_work/offgrid-test/offgrid_regular.npy/e/scratch/weatherai/shared_work/offgrid-test/offgrid_regular_geoinfos.npy/e/scratch/weatherai/shared_work/offgrid-test/offgrid_synop.npy/e/scratch/weatherai/shared_work/offgrid-test/offgrid_synop_geoinfos.npyIssue Number
Addresses #2178
Checklist before asking for review
./scripts/actions.sh lint./scripts/actions.sh unit-test./scripts/actions.sh integration-testlaunch-slurm.py --time 60